The effects of transcription factor competition on gene regulation. 



Nicolae Radu Zabet 1,2,* and Boris Adryan 1,2,†

1 Cambridge Systems Biology Centre, University of Cambridge, Tennis Court Road, Cambridge CB2 1QR, UK
2 Department of Genetics, University of Cambridge, Downing Street, Cambridge CB2 3EH, UK
*Email: n.r.zabet@gen.cam.ac.uk †Email: ba255@cam.ac.uk






Abstract 

We performed stochastic simulations of transcription factor (TF) molecules translocating by facilitated
diffusion (a combination of 3D diffusion in the cytoplasm and 1D random walk on the DNA), and considered
various abundances of cognate and non-cognate TFs to assess the influence of competitor molecules that
also move along the DNA. We show that molecular crowding on the DNA always leads to longer times 
required by TF molecules to locate their target sites as well as to lower occupancy, which may confer a 
general mechanism to control gene activity levels globally. Finally, we show that crowding on the DNA may 
increase transcriptional noise through increased variability of the occupancy time of the target sites. 

1 Introduction 

Transcription factors (TFs) are DNA-binding proteins that regulate gene activity by binding to specific sites on
the DNA. Riggs et al. (1970) observed that the association rate of the lac repressor (a bacterial TF) to its target
site is much faster than predicted by simple 3D diffusion. It was later proposed that TF molecules locate their
target sites through a combination of 3D diffusion and 1D random walk on the DNA,
which is often called facilitated diffusion (Berg et al., 1981; Halford and Marko, 2004). The rationale was that
the speed-up in target site finding is achieved by reducing the dimensionality of the search process (from 3D
to 1D). The existence of facilitated diffusion was demonstrated in vitro (Kabata et al., 1993) and in vivo (Elf et al.,
2007).



Following this initial work, a large number of studies investigated the search process and described the 
effects that various factors have on the speed at which TFs locate their target sites. With a few exceptions, these 
studies considered the case of molecules performing the search process on naked DNA, without any competitor 
species. It is clear that this is an approximation that needs further investigation, because other proteins, 
including other TFs, are translocating on the DNA at the same time. In fact, the proportion of inaccessible
DNA is high; for example, between 10% and 50% of the E. coli DNA is bound by other proteins (which we call
'non-cognate') (Flyvbjerg et al., 2006).

The question that we address in this manuscript is: how does the presence of other molecules on the DNA 
influence TF target site finding and binding? In particular, we are interested in describing both the distribution 
of the association rate to the target site and the distribution of the proportion of time the target site is occupied. 

There is the notion that crowding on the DNA can have two opposing effects: (i) reducing the amount
of DNA that needs to be 'scanned' by covering non-specific sites (Mirny et al., 2009) and (ii) increasing the
probability that the target site is already covered by non-cognate molecules (Flyvbjerg et al., 2006). In other
words, by increasing the abundance of non-cognate molecules, the amount of DNA that needs to be scanned 
is reduced, but, at the same time, the probability that the target site is occupied by a non-cognate molecule 
increases. This scenario suggests that there may be an optimal level of DNA occupancy that can increase the 
search speed. 

Murugan (2010) proved the existence of an optimal amount of crowding analytically, but this approach
contained approximations that could introduce biases in the final results. One assumption was that
the sliding length is inversely proportional to the number of molecules bound to the DNA, which is true only
if a bound molecule performs just 1D random walks and does not hop or jump, which are commonly accepted
modes of TF translocation (Bonnet et al., 2008; Wunderlich and Mirny, 2008). Furthermore, Murugan (2010)
disregarded the fact that the non-specific association rate is decreased when the DNA is occupied by other
molecules and that the target site can also be occupied by non-cognate molecules.






When these aspects are taken into account, Li et al. (2009) showed that the time to locate the target
site always increases with increasing amounts of crowding on the DNA. However, the aforementioned studies
(Flyvbjerg et al., 2006; Murugan, 2010; Li et al., 2009) assumed that the proteins bound to the DNA act as
fixed obstacles, i.e. they do not move on the DNA. This approximation needs further analysis, because the
non-cognate TF molecules will display similar dynamic behaviour as the cognate TFs under investigation.

Here, we use stochastic simulations to address these open questions, using our previously established
theoretical model and implementation (Zabet and Adryan, 2012c,a). Our approach allows us to measure various
parameters (such as the arrival time or the proportion of time a target site is occupied), while explicitly
representing all the molecules in the system and their dynamic behaviour. Using a well-characterised TF and
its best known binding site as a model, our results indicate that the average time the lac repressor (lacI) requires to
locate the O1 site increases when non-cognate molecules are added, thus supporting the result of Li et al.
(2009). We also looked at the effects of crowding on the variation in arrival times and found that the change
in the noise component of the arrival time is negligible when crowding is varied within biologically plausible limits.

A related question concerns the effect of crowding on the time a target site is occupied. Using the same
simulation framework, we measured the time the O1 site was occupied by a lacI molecule within one E. coli cell
cycle. The results show that crowding decreases the average target site occupancy while, at the same time,
increasing the variation in occupancy. This result suggests that transcriptional noise can, in part, be attributed
to the inherent crowding of molecules on the DNA. This result is supported by recent experimental evidence
that non-cognate TFs contribute to gene expression noise, and that one mechanism to reduce this noise consists in
insulating the target site with cognate TFs (Sasson et al., 2012).



In the case of mobile obstacles, the effects of crowding on search time and the proportion of time the
target is occupied are relatively small for biologically plausible crowding levels. To investigate whether this result
is influenced by our assumption that the non-cognate TFs are mobile, we also considered the case of immobile
obstacles and found that the trends are the same, but the change in search time and occupancy of the target
site is more dramatic. This result can be explained by the fact that barriers are formed on the DNA leading to
a reduction in the time to find (or return to) the target site (Ruusala and Crothers, 1992; Hammar et al., 2012).
For the occupancy of the target site, the increase in return time can be partially compensated by keeping the
molecule constrained in the vicinity of the target site (we initially observe a small increase in occupancy with
increasing crowding), as proposed in (Wang et al., 2012). However, this compensation effect rapidly becomes
negligible with increasing crowding on the DNA, due to the dramatic increase in the time required to return to
the target site.



2 Materials and Methods 



We performed stochastic simulations using a computational framework and a set of parameters presented in
(Zabet and Adryan, 2012c,a). Briefly, the model represents explicitly all molecules in the system and allows
us to perform event-driven stochastic simulations of the dynamics of the molecules in the system (Gillespie, 1976,
1977). The 3D diffusion is modelled implicitly by using the Master Equation, which was shown to be an accurate
approximation when simulating binding of TF molecules to the DNA (van Zon et al., 2006). The amount of
time a molecule spends at a certain position on the DNA is an exponentially distributed random number with
an average that is determined by the binding energy (Gerland et al., 2002), here approximated by the
position weight matrix (Stormo, 2000). Once the time spent at one position expires, the molecule
can slide to a nearby position, hop on the DNA or unbind from the DNA with certain probabilities, which were
previously estimated in (Zabet and Adryan, 2012a). Finally, steric hindrance is implemented by not allowing
two molecules to cover the same base pair simultaneously (Hermsen et al., 2006). In our system, we assume
the existence of two TF species: a cognate (lac repressor in our case) and a non-cognate one. The parameters
associated with the lac repressor, including its specificity expressed as a position weight matrix, can be found in
the appendix.
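The per-molecule update described in this section can be sketched as follows. This is a minimal illustration and not the implementation used for the simulations: the function names (`waiting_time`, `next_move`, `is_free`) and the set-based occupancy check are assumptions made here, and the mean waiting time is passed in directly rather than derived from the position weight matrix. The move probabilities are the lacI values listed in the appendix parameter table.

```python
import random

# lacI move probabilities quoted in the appendix parameter table.
P_LEFT = 0.4992629
P_RIGHT = 0.4992629
P_UNBIND = 0.001474111
P_DISSOCIATE = 0.1675  # chance that an unbinding event releases the molecule completely

# The three outcomes of a step should exhaust the probability mass.
assert abs(P_LEFT + P_RIGHT + P_UNBIND - 1.0) < 1e-6


def waiting_time(mean_wait: float, rng: random.Random) -> float:
    """Draw an exponentially distributed residence time with the given mean (seconds)."""
    return rng.expovariate(1.0 / mean_wait)


def next_move(rng: random.Random) -> str:
    """Pick the event that follows once the residence time has expired."""
    u = rng.random()
    if u < P_LEFT:
        return "slide_left"
    if u < P_LEFT + P_RIGHT:
        return "slide_right"
    # Remaining mass (about P_UNBIND): the molecule unbinds and then either
    # leaves the DNA completely or hops to a nearby position.
    return "dissociate" if rng.random() < P_DISSOCIATE else "hop"


def is_free(occupied: set, start: int, size: int) -> bool:
    """Steric hindrance: a footprint may not overlap any covered base pair."""
    return all(bp not in occupied for bp in range(start, start + size))
```

With these values a bound molecule slides in almost every step, so sliding dominates the local search while unbinding events are rare.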



In (Zabet, 2012) we showed that it is sufficient to simulate the target finding process using a smaller
(> 100 Kbp) region of DNA, provided that the parameters of the subsystem are adequately scaled. In particular,
we found that there are two methods (the copy number model and the association rate model) that can be
applied to adjust the parameters: the copy number model can be used for highly abundant TFs (such
as the non-cognate TFs in this case), while the association rate model is suited to less abundant TFs (lacI in this
case).

To simulate non-cognate crowding we considered the following abundances for these TFs: (i) 0, (ii) 10^4,
(iii) 3 x 10^4, (iv) 5 x 10^4 and (v) 7 x 10^4 molecules. The parameters of the non-cognate TFs were the same as
used in previous work (Zabet and Adryan, 2012a; Zabet, 2012), except the association rate, which was set to the
values listed in Table 1. This abundance of non-cognate TFs, the corresponding association rates and the fact
that each molecule covers 46 bp of DNA lead to various percentages of DNA being covered, which reside in
the range of biologically plausible values of 10% to 50% (Flyvbjerg et al., 2006) (except in the case of $TF_{nc} = 0$);
see Table 1. Note that these parameters that lead to various degrees of crowding on the DNA are similar to the
ones presented in the Supplementary Material of (Zabet, 2012).

$TF_{nc}$   $k^{assoc}_{nc}$ (s^-1)   covered DNA   $\overline{TF}_{nc}$   $\overline{k}^{assoc}$ (s^-1) for lacI copy number
                                                                           1             10            100           1000
0           [illegible]               0%            0                      [illegible]   [illegible]   [illegible]   [illegible]
10000       2000                      9%            216                    4.58          4.63          4.67          4.74
30000       2571                      26%           647                    6.11          6.10          6.19          6.32
50000       3600                      42%           1078                   8.63          8.76          8.73          8.88
70000       6000                      55%           1509                   13.15         13.05         13.06         13.26

Table 1: Sub-system parameters for various non-cognate molecule abundances. The overbar is used to denote
the corresponding parameters in the subsystem, e.g. $\overline{TF}_{nc}$ represents the abundance of non-cognate TFs in the
100 Kbp subsystem.

For each set of parameters, we performed 20 simulations, each running for 3000 s, which is approximately
the E. coli cell cycle (Rosenfeld et al., 2005). To increase simulation speed, we selected a 100 Kbp region of
DNA which contained the O1 site (nucleotides 300,000 - 400,000 in the E. coli K-12 genome) (Riley et al.,
2006). Since the non-cognate TFs are highly abundant, we applied the copy number model and obtained the
corresponding abundances of non-cognate TFs ($\overline{TF}_{nc}$) for use in the subsystem, as listed in Table 1.
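The subsystem abundances in Table 1 are consistent with scaling the copy number in proportion to the length of the simulated DNA. The sketch below illustrates this under two assumptions that are not stated in the text: a genome length of 4,639,675 bp for E. coli K-12, and rounding up to the next integer. The function name `scale_copy_number` is hypothetical.

```python
GENOME_BP = 4_639_675   # assumed E. coli K-12 genome length (not given in the text)
SUBSYSTEM_BP = 100_000  # size of the simulated region, as stated in the text


def scale_copy_number(full_copies: int,
                      genome_bp: int = GENOME_BP,
                      sub_bp: int = SUBSYSTEM_BP) -> int:
    """Scale a copy number by DNA length, rounding up (integer ceiling division)."""
    return -(-full_copies * sub_bp // genome_bp)


for copies in (10_000, 30_000, 50_000, 70_000):
    print(copies, scale_copy_number(copies))
```

Under these assumptions the four non-zero abundances map to 216, 647, 1078 and 1509 molecules, matching the subsystem column of Table 1.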

In addition to non-cognate TFs, the system also contains the cognate lacI molecules. We considered
several lacI abundances: (i) 1, (ii) 10, (iii) 100 and (iv) 1000 molecules. As in the case of non-cognate
TFs, we used the same parameters for lacI as in previous work (Zabet and Adryan, 2012a; Zabet, 2012). In the
case of the full system we considered an association rate of $k^{assoc}_{lacI} = 2400\ \mathrm{s}^{-1}$. When we applied the association
rate model to reduce the system to 100 Kbp, we obtained the values of the association rate corresponding to
each case, which are listed in Table 1.

Finally, we also considered the case of pseudo-immobile non-cognate TFs (also denoted as inc). These molecules
do not perform any 1D random walk on the DNA, but only associate to the DNA at a random position
and stay there for a long time. They have an average waiting time on the DNA of 30000 s, which
results in almost immobile obstacles on the DNA. Since we perform stochastic simulations, in the rare event
of one of the molecules leaving its site, it is returned to the cytoplasm, from where it can
again bind to the DNA at a random position. The full list of parameters for the pseudo-immobile non-cognate
TFs is given in appendix A. We also allow immobile non-cognate TFs to cover the O1 site, which would exclude
lacI molecules indefinitely from the O1 site. Thus, we performed 100 simulations for each set of parameters, and
simulations where the target site was never reached were discarded.

In the case of immobile obstacles, we also considered the case of 40000 copies of non-cognate TFs. This
was justified by the fact that, in the case of immobile obstacles, due to the high residence time of the non-cognate
TF molecules on the DNA, the percentage of DNA covered by molecules was higher than in the case of mobile
obstacles. For 40000 copies of non-cognate immobile molecules, 40% of the DNA was covered by DNA-binding
molecules, which is similar to the crowding level observed in the case of 50000 copies of non-cognate mobile
molecules. When we applied the copy number model and the association rate model to reduce the system to
100 Kbp, we obtained the following values: (i) $\overline{TF}_{nc} = 863$ and (ii) $\overline{k}^{assoc}_{1lacI} = \overline{k}^{assoc}_{10lacI} = \overline{k}^{assoc}_{100lacI} = \overline{k}^{assoc}_{1000lacI} = 7.37$.
Note that in the case of immobile obstacles, the association rate affects the results negligibly as long as the
binding to the DNA is fast compared to the amount of time spent bound to the DNA.



3 Results 

3.1 Time to locate the target site. 

First, we wanted to understand how crowding influences the association rate of a TF to its target site. Figure 1
shows the arrival times of the first lacI molecule to the O1 site for various abundances of non-cognate TFs and
lacI. Figure 1(a) considers the case of 1 lacI molecule in the cell and several levels of crowding on the DNA and
shows that, by increasing the amount of crowding, the arrival times always increase, but there is negligible
change in the variance of the search time in the range of biologically relevant crowding levels on the DNA (also
see appendix B).

One solution to reduce the noise in gene regulation is to increase the abundance of cognate TFs (Paulsson,
2005; Bar-Even et al., 2006; Zabet and Chu, 2010). When the abundance of lacI is increased, both the arrival
time and its variability are reduced; e.g. compare Figure 1(a) with the panels for higher lacI abundances. For low abundant TFs (1 - 10
copies per cell), which are common in bacterial cells (Wunderlich and Mirny, 2009), the variability of the arrival
rate is significant.

[Figure 1. Panels: (A) 1 lacI molecule, (B) 10 lacI molecules, (C) 100 lacI molecules, (D) 1000 lacI molecules.
x-axis: % of covered DNA (0%, 9%, 26%, 42%, 55%); y-axis: search time (s). Legible insets: 0.89 (A), 0.95 (B).]

Figure 1: The average time for the TF to reach the target site (measured in seconds) as a function of DNA
crowding in the case of mobile obstacles. Note the differences between scales of the y-axis, e.g. for 1 lacI
molecule it takes in the range of tens of minutes to locate the target site, while for 100 copies the search time
is in the range of seconds. The number in the inset represents the Pearson coefficient of correlation between
crowding and the mean search time. The values indicate that crowding is highly correlated with the search time,
in the sense that higher crowding on the DNA leads to higher search times. In the case of mobile obstacles, the
search times are unimodally distributed and, thus, we plotted the data using boxplots; see appendix B.

Next, we wanted to confirm that the results of our simulations are in accordance with previous experimental
studies. For example, Elf et al. (2007) found that the time for 1 lacI molecule to locate the O1 site
is ≈ 354 s. For 10 molecules of lacI (which is the endogenous level of lacI in E. coli) the search will be
ten times faster, ≈ 35 s. Figure 1 shows that in our simulations 10 lacI molecules can locate the O1 site
on average within similar times, but only for a degree of crowding between 9% ($\langle T^{0.09} \rangle = 35.84$ s) and 26%
($\langle T^{0.26} \rangle = 35.52$ s). If there is no competition on the DNA, the time is shorter ($\langle T^{0} \rangle = 30.32$ s), while for
higher levels of crowding the time is longer ($\langle T^{0.42} \rangle = 43.99$ s and $\langle T^{0.55} \rangle = 52.05$ s). This confirms that the
system was correctly parametrized.
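The comparison above can be checked with a few lines of arithmetic. The sketch below uses only the five mean search times quoted in the text for 10 lacI molecules; the variable names and the hand-written `pearson` helper are illustrative choices, and the 1/N scaling of the search time is the assumption already made in the text.

```python
from math import sqrt

crowding = [0.0, 0.09, 0.26, 0.42, 0.55]           # fraction of covered DNA
mean_search = [30.32, 35.84, 35.52, 43.99, 52.05]  # mean arrival times (s), 10 lacI, from the text


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


expected = 354.0 / 10  # single-molecule time of Elf et al. (2007) divided by the copy number
rel_dev = [(t - expected) / expected for t in mean_search]
r = pearson(crowding, mean_search)
print(f"expected {expected:.1f} s, Pearson r = {r:.2f}")
for c, d in zip(crowding, rel_dev):
    print(f"crowding {c:.0%}: deviation from expected {d:+.1%}")
```

Only the 9% and 26% crowding levels fall within a few percent of the expected 35.4 s, and the correlation between crowding and mean search time computed from these five values agrees with the inset reported for the 10 lacI panel of Figure 1.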

In this manuscript, we considered the case of moving obstacles on the DNA and found that increasing
the abundance of non-cognate TFs increases the arrival time at the target site, as proposed in (Li et al., 2009).
However, the increase in search time is not as dramatic as expected from previous studies. One difference
between our model and previous models (Li et al., 2009; Murugan, 2010) is that we assume mobile obstacles,
while the previous models assumed immobile obstacles. Thus, we also investigated whether the assumption of mobile
obstacles affected our results. To this end, we performed a series of simulations where we considered the
non-cognate TFs to be immobile obstacles on the DNA, as in (Li et al., 2009). The description of this TF species
can be found in the methods section and the full list of parameters in appendix A.

Figure 2 confirms that the search time increases when the crowding on the DNA is increased. However,
for crowding levels between 30% and 50% the arrival times display a bimodal distribution. One explanation
for this result is that in certain simulations the obstacles bind in the vicinity of the O1 site, from where they
create a barrier effect (Ruusala and Crothers, 1992; Hammar et al., 2012; Wang et al., 2012). This means that,
in a subset of the simulations, the search is significantly slower due to the barrier effect, while in the rest of the
simulations the search is faster due to lack of barriers. In the case when the obstacles do not create a barrier
effect, the influence that the level of crowding has on the speed of the search process is limited.

Previous studies, such as those of Li et al. (2009) and Murugan (2010), investigated the mean search time
under the assumption of immobile obstacles. Here we show that, in the case of immobile obstacles, the search
time displays a bimodal distribution and, thus, the mean search time cannot be used as a measure for the
arrival time.
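Why the mean is a poor summary of a bimodal distribution can be illustrated with a toy example. The arrival times below are synthetic and chosen only for illustration (they are not simulation output from this study): half of the hypothetical runs find the site quickly, and half are delayed by a barrier next to the target.

```python
import random
import statistics

rng = random.Random(0)

# Hypothetical arrival times in seconds; both groups are made-up values.
fast = [rng.gauss(50.0, 10.0) for _ in range(500)]      # runs without a barrier
slow = [rng.gauss(1500.0, 200.0) for _ in range(500)]   # runs delayed by a barrier
times = fast + slow

mean_time = statistics.mean(times)
# Fraction of runs whose arrival time lies within 20% of the mean.
near_mean = sum(abs(t - mean_time) <= 0.2 * mean_time for t in times) / len(times)
print(f"mean = {mean_time:.0f} s, runs within 20% of the mean: {near_mean:.1%}")
```

In such a mixture the mean falls between the two modes, where almost no individual run is found, so it describes neither subgroup.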

Finally, comparing the case of moving obstacles to the case of immobile obstacles, one can notice that
overall, the search process is faster in the case of mobile obstacles; compare the y-axes of Figures 1 and 2.
Hence, in the case of mobile obstacles, the slowdown in the search process caused by crowding on the DNA is only
marginal, while in the case of fixed obstacles there is a stronger effect.



3.2 Proportion of time the target site is occupied. 

The second aspect we were interested in is the proportion of time the target site is occupied by cognate TFs,
as this may have direct influence on gene expression. Sasson et al. (2012) found that binding sites of genes that
are occupied for shorter amounts of time display a larger degree of gene expression noise compared to binding
sites that are occupied for longer times by cognate TFs. They attributed this noise to the fact that cognate TF
molecules can 'insulate' the target site from non-cognate TF molecules. We wanted to verify the validity of this
assumption and, thus, we measured the fraction of time the target site is occupied during stochastic simulation
of the facilitated diffusion mechanism.

Figure 3 shows that, independent of lacI abundance, molecular crowding on the DNA reduces the average
occupancy of the target site, as previously proposed by Wasson and Hartemink (2009). This means that
crowding on the DNA can control gene expression levels at a global level. In the case of activating TFs, the
increase in DNA-binding protein copy numbers may lead to a reduction in gene expression.

Furthermore, this reduction in the average occupancy also introduces a larger degree of variability that
can be observed at target sites. This higher variability, in conjunction with the lower occupancy of the target
site, results in an amplified increase of the noise in the gene regulation process; see Figure 3 and appendix C.
Similar to the case of binding time to the target site, one method to reduce the noise levels in occupancy of
the target site is to increase the abundance of the cognate TF (lacI in our case). Our results confirm that the
increase in the noise levels generated by crowding can be compensated by an increase in lacI copy number.
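The amplification argument can be made concrete with a small sketch. It assumes that noise is quantified as the coefficient of variation (standard deviation divided by mean) of the occupancy across independent runs; this definition, the function names and the example numbers are assumptions made here for illustration.

```python
import statistics

CELL_CYCLE_S = 3000.0  # simulated time per run, as stated in the Methods


def occupancy_fraction(bound_intervals, t_end=CELL_CYCLE_S):
    """Fraction of [0, t_end] during which the target site is bound.

    bound_intervals: non-overlapping (start, stop) times in seconds.
    """
    bound = sum(min(stop, t_end) - max(start, 0.0)
                for start, stop in bound_intervals
                if stop > 0.0 and start < t_end)
    return bound / t_end


def noise(occupancies):
    """Coefficient of variation across independent runs (std / mean)."""
    return statistics.stdev(occupancies) / statistics.mean(occupancies)


# Two hypothetical sets of runs with the same spread but different mean occupancy.
print(noise([0.2, 0.4]), noise([0.1, 0.3]))
```

With an identical spread, the set with the lower mean occupancy has the larger coefficient of variation, so a simultaneous drop in mean and rise in variability compounds the increase in noise.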

Finally, we considered again the case of immobile obstacles and measured the occupancy of the O1 site.
Figure 4 displays an unexpected effect. By increasing the crowding, the occupancy of the target site increases
as well, but this effect stops when the DNA is 10% covered by molecules and afterwards the reverse effect is
observed (occupancy decreases at the O1 site). The explanation for this result is that, initially, increasing crowding
increases the amount of time spent in the vicinity of the target site (Wang et al., 2012), but when the increase
is above a certain level the relative contribution of the return time to the target site (which is the same as
the arrival time) is higher. In other words, increasing the level of molecular crowding on the DNA up to a
certain value has a positive effect (by increasing occupancy of the site) as proposed in (Wang et al., 2012), but
then this effect is diminished by significantly increasing the return time. In addition, this initial increase in the
occupancy, when non-cognate molecules are added into the system, is more pronounced for low abundant TFs;
compare Figure 4(a) to Figure 4(b). This indicates that the mechanism of keeping the TF in the vicinity of the
target site aids mainly the extremely low abundant TFs and even leakily expressed TFs, but the effect becomes
negligible for TFs that have at least 10 copies.

[Figure 2. Panels: (A) 1 lacI molecule, (B) 10 lacI molecules, (C) 100 lacI molecules, (D) 1000 lacI molecules.
x-axis: % of covered DNA (0%, 10%, 30%, 40%, 50%, 69-70%); y-axis: search time (s).]

Figure 2: The average time for the TF to reach the target site (measured in seconds) as a function of DNA
crowding in the case of immobile obstacles. For each set of parameters, we performed 100 simulations. Note that
the amount of covered DNA is higher than in the case of mobile obstacles, due to the fact that the molecules
spend more time bound to the DNA and the association rate is the same for both mobile and immobile obstacles.
To test if the arrival times are unimodally distributed, we used the R implementation (Maechler, 2012) of the
Dip test (Hartigan and Hartigan, 1985) and, in the case of immobile obstacles, we found that the distribution
of the arrival times is bimodal; see appendix C.

[Figure 3. Panels: (A) 1 lacI molecule, (B) 10 lacI molecules, (C) 100 lacI molecules, (D) 1000 lacI molecules.
x-axis: % of covered DNA (0%, 9%, 26%, 42%, 55%); y-axis: proportion of time the target site is occupied.
Legible inset: -0.91 (A).]

Figure 3: Proportion of time the target site is occupied (y-axis) as a function of DNA crowding (x-axis) in
the case of mobile obstacles. The number in the inset represents the Pearson coefficient of correlation between
crowding and the proportion of time the O1 site is occupied. The values indicate that crowding is highly
anti-correlated with the proportion of time the target site is occupied, in the sense that higher crowding on the DNA
leads to lower occupancy of the target site by cognate TFs. The occupancy of the target site by cognate TFs is
unimodal in the case of mobile obstacles and, thus, we represented the data using boxplots; see appendix C.

[Figure 4. Panels: (A) 1 lacI molecule, (B) 10 lacI molecules, (C) 100 lacI molecules, (D) 1000 lacI molecules.
x-axis: % of covered DNA (0%, 10%, 30%, 40%, 50%, 69-70%); y-axis: proportion of time the target site is occupied.]

Figure 4: Proportion of time the target site is occupied (y-axis) as a function of DNA crowding (x-axis) in the
case of immobile obstacles. For each set of parameters, we performed 100 simulations. We used boxplots to
represent the data due to the fact that the data is unimodal; see appendix C. In (a) the correlation between the
crowding level on the DNA and the mean occupancy of the target site is low and indicates that the two measures
are not correlated. Nevertheless, by removing the case of empty DNA (0% crowding), the correlation drops to
-0.93, indicating high anti-correlation.



4 Discussion 



The influence that molecular crowding has on gene regulation has been considered only in a few previous studies.
These studies mainly focused on the mean arrival time to the target site (such as (Murugan, 2010) and (Li et al.,
2009)) or variability of target site occupancy (Sasson et al., 2012). Although these works provided analytical
solutions on this issue, they did not consider the case of 'mobile obstacles' on the DNA (Zabet and Adryan,
2012b). Here, we performed stochastic simulations where each molecule was explicitly represented, thus allowing
an assessment of the influence of mobile versus fixed obstacles.

Our results show that, for both mobile and immobile obstacles, molecular crowding on the DNA (non-cognate
TFs) increases the arrival time of cognate TFs to their target site, as previously proposed by Li et al.
(2009). Furthermore, crowding on the DNA leads to a reduction in the proportion of time the target site is
occupied. This may be an important feedback mechanism in cases where the genes encode DNA-binding proteins 
(resulting in a negative feedback). Analogously, in the case of repressing transcription factors, if the repression 
is achieved by blocking the binding of the RNA polymerase to the promoters of genes, then an increase in 
crowding on the DNA would lead to further repression (again, resulting in a negative feedback). 

Crowding causes a reduction in occupancy and at the same time an increase in variability of the occupancy 
state across the population. Note that the variability here refers to population level variability and not time 
fluctuations, i.e., each simulation considers an independent 'virtual' cell. This means that a cell that has a lower 
number of DNA-binding proteins may display a finer control on gene regulation and less gene regulation noise. 
In order to get more local control on gene regulation, lower crowding on the DNA is required, but crowding on
the DNA is unavoidable. Hence, when the cell grows too much (in the sense of overall protein production) and
the DNA gets overcrowded, the noise in gene regulation reduces the fitness of the cell, an aspect which can be
compensated only if the cognate TF abundance increases as well. This indicates that when the cognate TFs are
a fixed percentage of the total abundance of DNA-binding proteins, there is an optimal level of crowding above
which the noise in gene regulation becomes harmful for the cell (similar to the results of Li et al. (2009)).



In the case of mobile obstacles, Figures 1 and 3 show that the crowding on the DNA has only a relatively
small effect on both search time and occupancy of the target site. In the case of immobile obstacles on the
DNA, both Figure 2 and Figure 4 provide evidence for new results. Although the same trends are observed as
in the case of mobile obstacles, there is a more dramatic change with increasing crowding (a higher and steeper
shift). In addition, the distribution of the arrival times is bimodal for biologically relevant levels of crowding. This
means that one cannot use the mean or variance of the arrival time to describe the effects on gene regulation
(as done previously), but should rather investigate the actual bimodal distribution. This bimodal
distribution in the arrival times has limited effect on the proportion of time the target site is occupied, when
averaging over the entire cell cycle, in the sense that the occupancy of the target site becomes unimodal.

This drastic change and bimodality is a consequence of barriers forming in the vicinity of the target site
(Ruusala and Crothers, 1992; Hammar et al., 2012; Wang et al., 2012). More specifically, for low crowding on
the DNA, the probability of a barrier forming in the vicinity of the target site is relatively low, while for high 
crowding on the DNA, the probability of a barrier forming in the vicinity of the target site is relatively high. 
For medium levels of crowding, there are two subgroups, namely: (i) systems where barriers are formed and 
(ii) systems where barriers are not formed. 

In this context, one might ask whether highly abundant fixed obstacles on the DNA really exist. In 
bacterial cells, given the high specificity of some TFs, we expect that a subset of the TFs would potentially
create these immobile barriers. However, given the low abundance of bacterial TFs (Wunderlich and Mirny,
2009), the position where these immobile obstacles emerge is encoded into the DNA. Thus, we cannot make a
general statement regarding the molecular crowding on the DNA, but rather this needs a systematic analysis 
for each particular promoter region. 

Alternatively, barriers can form on the DNA when there is strong direct TF-TF cooperativity, which will
lead to cluster formation on the DNA (Chu et al., 2009). This effect is removed when non-cognate TFs (that
do not display direct TF-TF cooperativity) are present in the cell, but it is always the case that molecules that 
do not display cooperativity will be bound to the DNA. 

Finally, the presence of nucleosomes on the DNA could be responsible for these barriers, but this is
particular only to eukaryotic systems and there is still no clear evidence that facilitated diffusion exists in
eukaryotic cells (Vukojevic et al., 2010; Gehring, 2011); discussed in (Zabet and Adryan, 2012b).

Often it is assumed that there is a direct relationship between binding site occupancy and expression 
level. We show that the variability in occupancy is not negligible and depends on the number of non-cognate 
molecules bound to the DNA. This variability can be observed between cells, is independent from fluctuations in 
the TF abundances (cognate or non-cognate), but arises from the facilitated diffusion mechanism and depends 
on crowding. In this context, the omission of variations in occupancy of the cis-regulatory region or wrong 
assumptions about its extent can generate misleading results when investigating the sources of noise in gene 
expression. 

Genetic research and synthetic biology often employ experiments where the abundances of one or several
TFs are changed significantly (either completely knocked down or significantly over-expressed). The general
assumption is that only the genes that are directly regulated by the corresponding TFs (and to some extent their
downstream targets) will be affected by this change. Nevertheless, a significant change in the overall abundance
of DNA-binding proteins can change the crowding on the DNA. Our study suggests that, in that
case, the activity state of all genes will be affected by the changed degree of crowding. It can be assumed
that evolution has come up with compensatory mechanisms that guarantee stable genomic expression levels, or
that the degree of crowding must change significantly (beyond what is biologically feasible) for these effects to
be measurable. This is where stochastic simulations can only inform us of theoretical possibilities, but where
ultimately biological experiments are required.

| parameter | lacI | non-cognate | immobile non-cognate | notation |
|---|---|---|---|---|
| copy number | $\in \{1, 10, 100, 1000\}$ | see Table 1 in main manuscript | see Table 1 in main manuscript | $TF_x$ |
| motif sequence | see Table 3 | | | |
| energetic penalty for mismatch | 1 $K_BT$ | 13 $K_BT$ | 13 $K_BT$ | |
| nucleotides covered on left | bp | 23 bp | 23 bp | |
| nucleotides covered on right | bp | 23 bp | 23 bp | |
| association rate to the DNA | see Table 1 in main manuscript | see Table 1 in main manuscript | see Table 1 in main manuscript | $k^{assoc}_x$ |
| unbinding probability | 0.001474111 | 0.001474111 | 1.0 | $P^{unbind}_x$ |
| probability to slide left | 0.4992629 | 0.4992629 | 0.0 | $P^{left}_x$ |
| probability to slide right | 0.4992629 | 0.4992629 | 0.0 | $P^{right}_x$ |
| probability to dissociate completely when unbinding | 0.1675 | 0.1675 | 1.0 | $P^{jump}_x$ |
| time bound at the target site | 1.18E-6 s | 0.3314193 s | 13.2e+09 s | $\tau^0_x$ |
| the size of a step to left | 1 bp | 1 bp | 1 bp | |
| the size of a step to right | 1 bp | 1 bp | 1 bp | |
| variance of repositioning distance after a hop | 1 bp | 1 bp | 1 bp | $\sigma^2_{hop}$ |
| the distance over which a hop becomes a jump | 100 bp | 100 bp | 100 bp | $d_{jump}$ |
| the proportion of prebound molecules | 0.0 | 0.9 | 0.9 | |
| affinity landscape roughness | | 1.0 $K_BT$ | 1.0 $K_BT$ | |

Table 2: TF species default parameters



5 Acknowledgments 

We would like to thank Mark Calleja for his support with configuring our simulations to run on CamGrid and 
Rob Foy, Robert Stojnic and Daphne Ezer for useful discussions and comments on the manuscript. 

Funding: This work was supported by the Medical Research Council [G1002110 to N.R.Z.]. B.A. is a Royal
Society University Research Fellow. 

APPENDIX 



A TF parameters 



The default parameters used here were previously derived in (Zabet and Adryan, 2012a) and (Zabet, 2012) and
are listed in Table 2. In order to compare our results to the ones of Li et al. (2009) and Murugan (2010), we
also considered a pseudo-immobile non-cognate TF species. We assumed that this species does not diffuse on
the DNA (hence $P^{left}_{nc} = P^{right}_{nc} = 0.0$, $P^{unbind}_{nc} = 1.0$ and $P^{jump}_{nc} = 1.0$). In order to increase the amount
of time spent at one position, we set $\tau^0_{nc} = 13.2 \times 10^{9}$ s. We derived this waiting time by assuming that if the affinity
landscape has a mean binding energy of 13 $K_BT$ and we aim to keep the molecules bound to the DNA for 30000 s
(ten times the length of the simulation), then (Zabet and Adryan, 2012a)



$$30000 = \tau^0_{nc} \exp(-13) \;\Rightarrow\; \tau^0_{nc} = 30000 \cdot \exp(13) \approx 13.2 \times 10^{9} \qquad (1)$$
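As a quick numerical check of equation (1), the following minimal Python sketch solves the residence-time relation for the waiting time. The function name `tau0_for_target_residence` is ours and chosen only for illustration; the relation residence = tau0 * exp(-E) is taken directly from the derivation above.

```python
import math


def tau0_for_target_residence(residence_s: float, mean_energy_kbt: float) -> float:
    """Solve residence = tau0 * exp(-E) for tau0, with E in units of K_B T."""
    return residence_s * math.exp(mean_energy_kbt)


# Values quoted in the text: 30000 s residence, mean binding energy of 13 K_B T.
tau0_nc = tau0_for_target_residence(30000.0, 13.0)

# Prints roughly 1.33e10 s, in line with the ~13.2e+09 quoted in the text.
print(f"tau0_nc = {tau0_nc:.4g} s")
```

Substituting the result back into the relation, `tau0_nc * math.exp(-13.0)`, recovers the intended residence time of 30000 s.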



The PWM of lacI was presented in (Zabet, 2012) and is also listed in Table 3.
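To illustrate how the movement probabilities in Table 2 could be used, the sketch below samples a single event for a molecule bound to the DNA. It is only one reading of the parameter descriptions under stated assumptions, not the simulation code used in this work: the order of the random draws, the Gaussian hop displacement rounded to whole base pairs, and all names (`next_move`, `LACI`, `IMMOBILE`, the dictionary keys) are hypothetical.

```python
import random

# Default values copied from Table 2; the key names are illustrative only.
LACI = {"p_unbind": 0.001474111, "p_left": 0.4992629, "p_right": 0.4992629,
        "p_jump": 0.1675, "sigma2_hop": 1.0, "d_jump": 100}
IMMOBILE = {"p_unbind": 1.0, "p_left": 0.0, "p_right": 0.0,
            "p_jump": 1.0, "sigma2_hop": 1.0, "d_jump": 100}


def next_move(params, rng):
    """Sample one event for a bound molecule.

    Returns (kind, displacement_bp); displacement is None for a jump,
    i.e. a complete dissociation from the DNA.
    """
    u = rng.random()
    if u < params["p_left"]:
        return "slide", -1
    if u < params["p_left"] + params["p_right"]:
        return "slide", 1
    # Remaining probability mass is treated as unbinding (assumption: the
    # three probabilities sum to one up to rounding).
    if rng.random() < params["p_jump"]:
        return "jump", None
    # Assumed hop model: Gaussian displacement with the tabulated variance,
    # rounded to whole base pairs.
    shift = round(rng.gauss(0.0, params["sigma2_hop"] ** 0.5))
    if abs(shift) > params["d_jump"]:
        return "jump", None  # long repositioning is treated as a jump
    return "hop", shift


rng = random.Random(42)
counts = {"slide": 0, "hop": 0, "jump": 0}
for _ in range(20000):
    kind, _shift = next_move(LACI, rng)
    counts[kind] += 1
print(counts)
```

With the immobile parameter set, every sampled event is a complete dissociation, which matches the description of a species that does not diffuse on the DNA.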



| Position | A | C | G | T |
|---|---|---|---|---|
| 1 | 0.6200 | -0.6900 | 0.1400 | -0.6900 |
| 2 | 0.6200 | -0.6900 | 0.1400 | -0.6900 |
| 3 | 0.1?00 | 0.1400 | -0.6900 | 0.1?00 |
| 4 | 0.1?00 | -0.6900 | -0.6900 | 0.6200 |
| 5 | -0.7000 | -0.7000 | 0.9000 | -0.7000 |
| 6 | -0.6900 | -0.6900 | -0.6900 | 0.9300 |
| 7 | 0.0077 | -0.00?4 | -0.00?? | 0.00?? |
| 8 | 0.0077 | -0.00?4 | -0.00?? | 0.00?? |
| 9 | 0.0077 | -0.00?4 | -0.00?? | 0.00?? |
| 10 | 0.0077 | -0.00?4 | -0.00?? | 0.00?? |
| 11 | 0.0077 | -0.00?4 | -0.00?? | 0.00?? |
| 12 | 0.0077 | -0.00?4 | -0.00?? | 0.00?? |
| 13 | 0.0077 | -0.00?4 | -0.00?? | 0.00?? |
| 14 | 0.0077 | -0.00?4 | -0.00?? | 0.00?? |
| 15 | 0.0077 | -0.00?4 | -0.00?? | 0.00?? |
| 16 | 0.6200 | -0.6900 | 0.1400 | -0.6900 |
| 17 | -0.7000 | 0.9000 | -0.7000 | -0.7000 |
| 18 | 0.9300 | -0.6900 | -0.6900 | -0.6900 |
| 19 | 0.9300 | -0.6900 | -0.6900 | -0.6900 |
| 20 | -0.6900 | 0.1400 | -0.6900 | 0.6200 |
| 21 | -0.6900 | 0.1400 | -0.6900 | 0.6200 |

Table 3: lacI PWM



B Time to reach the target site in the case of mobile obstacles 



To test whether the distribution of the arrival times to the target site is bimodal or unimodal, we used an
R implementation (Maechler, 2012) of the Dip test (Hartigan and Hartigan, 1985). Figure 5 shows that the
distribution of the arrival time is unimodal (with p-values higher than 0.1) only for mobile obstacles. In the
case of immobile obstacles and 30-50% of the DNA being covered by DNA-binding proteins, the distribution
becomes bimodal. This means that the arrival time to the target site can be represented using boxplots for
mobile obstacles, but needs to be represented with violin plots for immobile obstacles.

In addition, in the main manuscript, we mentioned that the variability of the search time is only negligibly
influenced by biologically relevant levels of crowding on the DNA. Figure 6 confirms this result.



C Proportion of time the target site is occupied in the case of mobile 
obstacles 

To test the unimodality of the proportion of time the target site is occupied, we again used the Dip test. Figure
7 shows that the distribution is unimodal (with p-values higher than 0.1) for both mobile and immobile obstacles.
Thus, the proportion of time spent at the target site can be represented using boxplots.

We also looked at the noise in occupancy and found that there is indeed a strong correlation between
crowding levels on the DNA and noise in the proportion of time the target site is occupied. In particular, we
found that increasing the level of crowding on the DNA also increases the noise in the occupancy of the target
site. Interestingly, this is valid for both mobile (Figure 8) and immobile obstacles (Figure 9).
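The noise measure used here (variance normalised by the square of the mean) and its Pearson correlation with the crowding level can be computed as in the following sketch. The replicate values are placeholders invented for illustration, not simulation results, and the use of the population variance is an assumption.

```python
from math import sqrt
from statistics import fmean, pvariance


def noise(samples):
    """Variance normalised by the squared mean (population variance assumed)."""
    m = fmean(samples)
    return pvariance(samples, mu=m) / (m * m)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    mx, my = fmean(xs), fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Placeholder replicates: proportion of time the target site is occupied,
# keyed by % of covered DNA. These numbers are made up for the example.
occupancy_runs = {
    10: [0.50, 0.52, 0.48],
    30: [0.40, 0.46, 0.34],
    50: [0.30, 0.40, 0.20],
}

levels = sorted(occupancy_runs)
noise_per_level = [noise(occupancy_runs[c]) for c in levels]
print(pearson(levels, noise_per_level))
```

The value printed at the end plays the role of the number shown in the inset of the noise figures, computed here for the placeholder data only.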



References 

Bar-Even, A., Paulsson, J., Maheshri, N., Carmi, M., O'Shea, E., Pilpel, Y., and Barkai, N. (2006). Noise in 
protein expression scales with natural protein abundance. Nature Genetics, 38(6):636-643. 

Berg, O. G., Winter, R. B., and von Hippel, P. H. (1981). Diffusion-driven mechanisms of protein translocation
on nucleic acids. 1. models and theory. Biochemistry, 20(24):6929-6948.



[Figure 5. Panels: A, mobile obstacles; B, immobile obstacles. x-axis: % of covered DNA.]

Figure 5: The dip test of unimodality for the average time for the TF to reach the target site (measured in
seconds) as a function of DNA crowding. This graph plots the p-value that the distribution is bimodal when
applying the dip test. The dashed line indicates a threshold p-value of 0.1. There are two cases: (a) mobile
obstacles and (b) immobile obstacles.

Bonnet, I., Biebricher, A., Porte, P.-L., Loverdo, C., Benichou, O., Voituriez, R., Escude, C., Wende, W.,
Pingoud, A., and Desbiolles, P. (2008). Sliding and jumping of single EcoRV restriction enzymes on
non-cognate DNA. Nucleic Acids Research, 36(12):4118-4127.

Chu, D., Zabet, N. R., and Mitavskiy, B. (2009). Models of transcription factor binding: Sensitivity of activation 
functions to model assumptions. Journal of Theoretical Biology, 257(3):419-429. 

Elf, J., Li, G.-W., and Xie, X. S. (2007). Probing transcription factor dynamics at the single-molecule level in 
a living cell. Science, 316:1191-1194. 

Flyvbjerg, H., Keatch, S. A., and Dryden, D. T. (2006). Strong physical constraints on sequence-specific target 
location by proteins on DNA molecules. Nucleic Acids Research, 34(9):2550-2557. 

Gehring, W. J. (2011). How do Hox transcription factors find their target genes in the nucleus of living cells?
Biologie Aujourd'hui, 205(2):75-85.

Gerland, U., Moroz, J. D., and Hwa, T. (2002). Physical constraints and functional characteristics of
transcription factor-DNA interactions. PNAS, 99(19):12015-12020.

Gillespie, D. T. (1976). A general method for numerically simulating the stochastic time evolution of coupled 
chemical reactions. Journal of Computational Physics, 22(4):403-434. 

Gillespie, D. T. (1977). Exact stochastic simulation of coupled chemical reactions. The Journal of Physical 
Chemistry, 81:2340-2361. 

Halford, S. E. and Marko, J. F. (2004). How do site-specific DNA-binding proteins find their targets? Nucleic 
Acids Research, 32(10):3040-3052. 

Hammar, P., Leroy, P., Mahmutovic, A., Marklund, E. G., Berg, O. G., and Elf, J. (2012). The lac repressor 
displays facilitated diffusion in living cells. Science, 336(6088):1595-1598. 

Hartigan, J. A. and Hartigan, P. M. (1985). The dip test of unimodality. Annals of Statistics, 13(1):70-84.

Hermsen, R., Tans, S., and ten Wolde, P. R. (2006). Transcriptional regulation by competing transcription 
factor modules. PLoS Computational Biology, 2:1552-1560. 

Kabata, H., Kurosawa, O., Arai, I., Washizu, M., Margarson, S., Glass, R., and Shimamoto, N. (1993). Visualization
of single molecules of RNA polymerase sliding along DNA. Science, 262(5139):1561-1563.

Li, G.-W., Berg, O. G., and Elf, J. (2009). Effects of macromolecular crowding and DNA looping on gene
regulation kinetics. Nature Physics, 5:294-297.

Maechler, M. (2012). diptest: Hartigan's dip test statistic for unimodality code. R package version 0.75-4. 



[Figure 6. Panels: A, 1 lacI molecule; B, 10 lacI molecules; C, 100 lacI molecules; D, 1000 lacI molecules. x-axis: % of covered DNA.]

Figure 6: Noise in search time as a function of the crowding levels on the DNA. We considered that the noise
in search time is the variance normalised by the square of the mean, as previously supported by (Paulsson, 2005).
The number in the inset represents the Pearson coefficient of correlation between crowding and the noise of the
search time.



[Figure 7. Panels: A, mobile obstacles; B, immobile obstacles. x-axis: % of covered DNA.]

Figure 7: The dip test of unimodality for the proportion of time the target site is occupied as a function of the
crowding levels on the DNA. The graph plots the p-value that the distribution is bimodal when applying the
dip test. The dashed line indicates a threshold p-value of 0.1. There are two cases: (a) mobile obstacles and
(b) immobile obstacles.



Mirny, L., Slutsky, M., Wunderlich, Z., Tafvizi, A., Leith, J., and Kosmrlj, A. (2009). How a protein searches for 
its site on DNA: the mechanism of facilitated diffusion. Journal of Physics A: Mathematical and Theoretical, 
42:434013. 

Murugan, R. (2010). Theory of site-specific interactions of the combinatorial transcription factors with DNA.
Journal of Physics A: Mathematical and Theoretical, 43:195003.

Paulsson, J. (2005). Models of stochastic gene expression. Physical Life Reviews, 2:157-175. 

Riggs, A. D., Bourgeois, S., and Cohn, M. (1970). The lac repressor-operator interaction: III. kinetic studies.
Journal of Molecular Biology, 53(3):401-417.

Riley, M., Abe, T., Arnaud, M. B., Berlyn, M. K., Blattner, F. R., Chaudhuri, R. R., Glasner, J. D., Horiuchi,
T., Keseler, I. M., Kosuge, T., Mori, H., Perna, N. T., Plunkett, G., Rudd, K. E., Serres, M. H., Thomas,
G. H., Thomson, N. R., Wishart, D., and Wanner, B. L. (2006). Escherichia coli K-12: a cooperatively
developed annotation snapshot - 2005. Nucleic Acids Research, 34(1):1-9.

Rosenfeld, N., Young, J. W., Alon, U., Swain, P. S., and Elowitz, M. B. (2005). Gene regulation at the single-cell 
level. Science, 307(5717):1962-1965. 

Ruusala, T. and Crothers, D. M. (1992). Sliding and intermolecular transfer of the lac repressor: kinetic
perturbation of a reaction intermediate by a distant DNA sequence. PNAS, 89(11):4903-4907.

Sasson, V., Shachrai, I., Bren, A., Dekel, E., and Alon, U. (2012). Mode of regulation and the insulation of
bacterial gene expression. Molecular Cell, 46(4):399-407.

Stormo, G. D. (2000). DNA binding sites: representation and discovery. Bioinformatics, 16(1):16-23.

van Zon, J. S., Morelli, M. J., Tanase-Nicola, S., and ten Wolde, P. R. (2006). Diffusion of transcription factors 
can drastically enhance the noise in gene expression. Biophysical Journal, 91:4350-4367. 

Vukojevic, V., Papadopoulos, D. K., Terenius, L., Gehring, W. J., and Rigler, R. (2010). Quantitative study of
synthetic Hox transcription factor-DNA interactions in live cells. PNAS, 107(9):4093-4098.

Wang, S., Elf, J., Hellander, S., and Lotstedt, P. (2012). Stochastic reaction-diffusion processes with embedded 
lower dimensional structures. Technical report, Department of Information Technology, Uppsala University. 

Wasson, T. and Hartemink, A. J. (2009). An ensemble model of competitive multi-factor binding of the genome. 
Genome Research, 19:2101-2112. 

Wunderlich, Z. and Mirny, L. A. (2008). Spatial effects on the speed and reliability of protein-DNA search.
Nucleic Acids Research, 36(11):3570-3578.



[Figure 8. Panels: A, 1 lacI molecule; B, 10 lacI molecules. x-axis: % of covered DNA.]

Figure 8: Noise in the proportion of time the target site is occupied as a function of the crowding levels on the
DNA in the case of mobile obstacles. Again we normalised the variance by the square of the mean. The number
in the inset represents the Pearson coefficient of correlation between crowding and the noise in the occupancy
time.



[Figure 9. Panels: A, 1 lacI molecule; B, 10 lacI molecules. x-axis: % of covered DNA.]

Figure 9: Noise in the proportion of time the target site is occupied as a function of the crowding levels on the
DNA in the case of immobile obstacles. We normalised the variance by the square of the mean. The number
in the inset represents the Pearson coefficient of correlation between crowding and the noise in the occupancy
time.



Wunderlich, Z. and Mirny, L. A. (2009). Different gene regulation strategies revealed by analysis of binding 
motifs. Trends in Genetics, 25(10):434-440. 

Zabet, N. R. (2012). System size reduction in stochastic simulations of the facilitated diffusion mechanism.
BMC Systems Biology, 6(1):121.

Zabet, N. R. and Adryan, B. (2012a). A comprehensive computational model of facilitated diffusion in prokary- 
otes. Bioinformatics, 28(11):1517-1524. 

Zabet, N. R. and Adryan, B. (2012b). Computational models for large-scale simulations of facilitated diffusion.
Molecular BioSystems, 8(11):2815-2827.

Zabet, N. R. and Adryan, B. (2012c). GRiP: a computational tool to simulate transcription factor binding in 
prokaryotes. Bioinformatics, 28(9):1287-1289. 

Zabet, N. R. and Chu, D. F. (2010). Computational limits to binary genes. Journal of Royal Society Interface,
7:945-954.






